Preference of simple sequence repeats in coding and non-coding regions of Arabidopsis thaliana

Authors

  • Lida Zhang
  • Dejun Yuan
  • Shunwu Yu
  • Zhugang Li
  • Youfang Cao
  • Zhiqi Miao
  • Hongmei Qian
  • Kexuan Tang
Abstract

MOTIVATION: Simple sequence repeats, or microsatellites, are abundant in many genomes. However, the significance of their distribution preference is not completely understood. Completion of the Arabidopsis genome sequence allows us to better understand and characterize microsatellites.

RESULTS: Microsatellites were more abundant in the 5'-flanking regions of genes than expected from the whole genome, with an over-representation of AG and AAG repeats. This differed clearly from the distributions in 3'-flanks and coding fractions, where triplet frequencies evidently corresponded to codon usage. We identified 1140 full-length genes that contained at least one locus of AG or AAG repeats in their upstream sequences, and whose functional characteristics were significantly associated with the repeats. This observation indicates that selective pressure differed markedly among the three transcribed regions, with positive selection of AG and AAG repeats in 5'-flanks close to genes whose products are preferentially involved in transcription.
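To make the kind of repeat counted above concrete, here is a minimal Python sketch that locates perfect di- and trinucleotide tandem repeats such as AG and AAG in a DNA string. It is not the authors' method: the function name `find_ssrs`, the minimum of four repeat units, and the restriction to perfect repeats are all assumptions chosen for illustration, and the abstract does not state the criteria actually used. The sketch also reports motifs exactly as they occur, so AG and GA are not merged into a single repeat class.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=3, min_repeats=4):
    """Return sorted (start, unit, repeat_count) tuples for perfect tandem repeats.

    Hypothetical helper: the thresholds are illustrative assumptions,
    not the criteria used in the paper.
    """
    seq = seq.upper()
    hits = []
    for unit_len in range(min_unit, max_unit + 1):
        # The negative lookahead rejects homopolymer units such as "AA" or "AAA",
        # so a poly-A run is not reported as a di- or trinucleotide repeat.
        # Group 3 is the repeat unit; group 2 is the whole repeat tract.
        pattern = re.compile(
            r"(?!(.)\1{%d})(([ACGT]{%d})\3{%d,})"
            % (unit_len - 1, unit_len, min_repeats - 1)
        )
        for match in pattern.finditer(seq):
            tract, unit = match.group(2), match.group(3)
            hits.append((match.start(), unit, len(tract) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence (made up for illustration) with one AG tract and one AAG tract.
    for start, unit, count in find_ssrs("TTAGAGAGAGAGCCAAGAAGAAGAAGTT"):
        print(start, unit, count)
```

Running the same function separately on 5'-flank, coding, and 3'-flank sequences and normalizing the hit counts by sequence length would give per-region densities of the sort the abstract compares.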

Similar resources

Yeast Two Hybrid cDNA Screening of Arabidopsis thaliana for SETH4 Protein Interaction

The SETH4 coding sequence, 2013 bp long, is a member of a gene family expressed in gametophytic tissues of Arabidopsis thaliana. This fragment was PCR amplified using KOD Hi-Fi DNA polymerase. It was cloned into the pGBKT7 bait vector, and transformed E. coli DH5α cells containing the vector were selected on LB medium containing kanamycin. Finally, the pGBKT7-SETH4 bait was transformed into yeast ...

Mapping and analysis of Simple Sequence Repeats in the Arabidopsis thaliana Genome

Simple sequence repeats (SSRs) are becoming standard DNA markers for plant genome analysis and are used in marker-assisted breeding. Because of their significance, we initiated this study to analyze the complete genome of Arabidopsis thaliana for the prevalence of mono-, di-, tri-, tetra-, penta- and hexa-mer repeats in the coding and non-coding regions of the ch...

Genomic distribution of simple sequence repeats in Brassica rapa.

Simple Sequence Repeats (SSRs) represent short tandem duplications found within all eukaryotic organisms. To examine the distribution of SSRs in the genome of Brassica rapa ssp. pekinensis, SSRs from different genomic regions representing 17.7 Mb of genomic sequence were surveyed. SSRs appear more abundant in non-coding regions (86.6%) than in coding regions (13.4%). Comparison of SSR densities...

A large number of novel coding small open reading frames in the intergenic regions of the Arabidopsis thaliana genome are transcribed and/or under purifying selection.

Large-scale cDNA sequencing projects and tiling array studies have revealed the presence of many unannotated genes. For protein coding genes, small coding sequences may not be identified by gene finders because of the conservative nature of prediction algorithms. In this study, we identified small open reading frames (sORFs) with high coding potential by a simple gene finding method (Coding Ind...

Molecular characterization of apolipoprotein A-I from the skin mucosa of Cyprinus carpio

Apolipoprotein A-I is the most abundant protein in Cyprinus carpio plasma that plays an important role in lipid transport and protection of the skin by means of its antimicrobial activity. A 527 bp cDNA fragment encoding C terminus part of apoA-I from the skin mucosa of common carp was isolated using RT-PCR. After GenBank database searching, a partial sequence containing a coding sequence (CDS)...

Journal:
  • Bioinformatics

Volume 20, Issue 7

Pages: -

Publication date: 2004